A Comprehensive Chinese Thesaurus System and its Weighting Scheme
Authors
Abstract
Semantic and conceptual knowledge can greatly help in the processing of Chinese information. A well-designed thesaurus can comprehensively reveal the semantic relationships among different elements in documents, and thus serves as a critical tool in intelligent Chinese information processing systems. In this research, we have designed a comprehensive Chinese thesaurus system that can be used in customized applications for the processing of Chinese information. Our objective is for the thesaurus to support efficient search for synonyms of subjects and keywords, for broader or narrower terms, and for related subjects and terms. In addition, the closeness of these interrelationships can be quantified so that the thesaurus simulates human performance more closely.
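As a rough illustration of the kind of structure the abstract describes, the following minimal sketch stores synonym, broader, narrower, and related links between terms, each with a numeric closeness weight. It is not the authors' design: the class name `WeightedThesaurus`, the `[0, 1]` weight range, the automatic inverse links, and the sample terms and weights are all assumptions made for this example, and the abstract does not specify how the actual weighting scheme is computed.

```python
from collections import defaultdict

# Relation types named in the abstract, mapped to their inverse relation.
# The identifiers themselves are illustrative assumptions.
INVERSE = {
    "synonym": "synonym",
    "broader": "narrower",
    "narrower": "broader",
    "related": "related",
}


class WeightedThesaurus:
    """Hypothetical thesaurus whose links carry a closeness weight."""

    def __init__(self):
        # term -> relation -> {other term: weight}
        self._links = defaultdict(lambda: defaultdict(dict))

    def add(self, term, relation, other, weight):
        """Link `term` to `other` and store the inverse link as well."""
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation}")
        if not 0.0 <= weight <= 1.0:
            # Assumed range; the source does not define one.
            raise ValueError("weight must lie in [0, 1]")
        self._links[term][relation][other] = weight
        self._links[other][INVERSE[relation]][term] = weight

    def lookup(self, term, relation, min_weight=0.0):
        """Return (term, weight) pairs, closest first, ties by term."""
        candidates = self._links.get(term, {}).get(relation, {})
        hits = [(o, w) for o, w in candidates.items() if w >= min_weight]
        return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Sample entries; the terms and weights are invented for illustration.
thesaurus = WeightedThesaurus()
thesaurus.add("电脑", "synonym", "计算机", 0.95)
thesaurus.add("电脑", "broader", "设备", 0.6)
thesaurus.add("电脑", "related", "软件", 0.7)

print(thesaurus.lookup("电脑", "synonym"))
print(thesaurus.lookup("设备", "narrower"))
```

Storing the inverse link on every insert keeps broader and narrower lookups symmetric, and the `min_weight` parameter lets a caller keep only the closest terms, which is one simple way a quantified relationship could be used at query time.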
Related resources
The Effect of Weighted Term Frequencies on Probabilistic Latent Semantic Term Relationships
Probabilistic latent semantic analysis (PLSA) is a method of calculating term relationships within a document set using term frequencies. It is well known within the information retrieval community that raw term frequencies contain various biases that affect the precision of the retrieval system. Weighting schemes, such as BM25, have been developed in order to remove such biases and hence impro...
Covering Ambiguity Resolution in Chinese Word Segmentation Based on Contextual Information
Covering ambiguity is one of the two basic types of ambiguities in Chinese word segmentation. We regard its resolution as equivalent to word sense disambiguation, and make use of the classical vector space model in information retrieval to formulate the contexts of ambiguous words. A variation form of TFIDF weighting is proposed and a Chinese thesaurus is additionally utilized to cope with data...
A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affects the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...
A system for the retrieval of Italian broadcast news
This paper presents a prototype for the retrieval of Italian broadcast news, which has been developed at ITC-irst. The architecture employs a speech recognition engine for the automatic transcription of audio news. Moreover, it features document indexing based on part-of-speech tagging of text coupled with morphological analysis, and query expansion exploiting the Italian WordNet thesaurus. Que...
Word Extraction Based on Semantic Constraints in Chinese Word-Formation
This paper presents a novel approach to Chinese word extraction based on semantic information of characters. A thesaurus of Chinese characters is conducted. A Chinese lexicon with 63,738 two-character words, together with the thesaurus of characters, are explored to learn semantic constraints between characters in Chinese word-formation, forming a semantic-tag-based HMM. The Baum-Welch re-estim...
Journal title:
Volume, issue:
Pages: -
Publication date: 2007